8 research outputs found

    Neural Named Entity Recognition from Subword Units

    Full text link
    Named entity recognition (NER) is a vital task in spoken language understanding, which aims to identify mentions of named entities in text, e.g. from transcribed speech. Existing neural models for NER rely mostly on dedicated word-level representations, which suffer from two main shortcomings. First, the vocabulary size is large, yielding large memory requirements and long training times. Second, these models are not able to learn morphological or phonological representations. To remedy these shortcomings, we adopt a neural solution based on bidirectional LSTMs and conditional random fields, where we rely on subword units, namely characters, phonemes, and bytes. For each word in an utterance, our model learns a representation from each of the subword units. We conducted experiments in a real-world large-scale setting for the use case of a voice-controlled device covering four languages with up to 5.5M utterances per language. Our experiments show that (1) with increasing training data, the performance of models trained solely on subword units approaches that of models with dedicated word-level embeddings (91.35 vs 93.92 F1 for English) while using a much smaller vocabulary (332 vs 74K), (2) subword units enhance models with dedicated word-level embeddings, and (3) combining different subword units improves performance.
    Comment: 5 pages, INTERSPEECH 201
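    As a rough illustration of the architecture described above, here is a minimal sketch of a character-level encoder feeding a word-level BiLSTM, assuming PyTorch; all class names and sizes are illustrative, the CRF layer is omitted for brevity, and this is not the authors' implementation.

```python
# Hedged sketch: character-level word encoder + word-level BiLSTM emissions.
# Names and sizes are illustrative; a CRF would normally consume the emissions.
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    """Builds one vector per word from its character ids with a BiLSTM."""
    def __init__(self, n_chars=332, char_dim=30, hidden=50):
        super().__init__()
        self.embed = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.lstm = nn.LSTM(char_dim, hidden, bidirectional=True,
                            batch_first=True)

    def forward(self, char_ids):                   # (n_words, max_word_len)
        _, (h, _) = self.lstm(self.embed(char_ids))
        return torch.cat([h[0], h[1]], dim=-1)     # (n_words, 2 * hidden)

class SubwordTagger(nn.Module):
    """Word-level BiLSTM over character-derived word vectors."""
    def __init__(self, n_tags=9, hidden=100):
        super().__init__()
        self.chars = CharWordEncoder()
        self.lstm = nn.LSTM(100, hidden, bidirectional=True, batch_first=True)
        self.emit = nn.Linear(2 * hidden, n_tags)

    def forward(self, char_ids):                   # one utterance
        words = self.chars(char_ids).unsqueeze(0)  # (1, n_words, 100)
        out, _ = self.lstm(words)
        return self.emit(out)                      # per-word tag scores

tagger = SubwordTagger()
scores = tagger(torch.randint(1, 332, (7, 12)))   # 7 words, 12 chars each
print(scores.shape)                               # torch.Size([1, 7, 9])
```

    Phoneme or byte units would follow the same pattern with a different vocabulary, and finding (3) suggests concatenating the resulting per-word vectors before the word-level BiLSTM.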

    ComQA: A Community-sourced Dataset for Complex Factoid Question Answering with Paraphrase Clusters

    Get PDF
    To bridge the gap between the capabilities of the state-of-the-art in factoid question answering (QA) and what users ask, we need large datasets of real user questions that capture the various question phenomena users are interested in, and the diverse ways in which these questions are formulated. We introduce ComQA, a large dataset of real user questions that exhibit different challenging aspects such as compositionality, temporal reasoning, and comparisons. ComQA questions come from the WikiAnswers community QA platform, which typically contains questions that are not satisfactorily answerable by existing search engine technology. Through a large crowdsourcing effort, we clean the question dataset, group questions into paraphrase clusters, and annotate clusters with their answers. ComQA contains 11,214 questions grouped into 4,834 paraphrase clusters. We detail the process of constructing ComQA, including the measures taken to ensure its high quality while making effective use of crowdsourcing. We also present an extensive analysis of the dataset and the results achieved by state-of-the-art systems on ComQA, demonstrating that our dataset can be a driver of future research on QA.
    Comment: 11 pages, NAACL 201
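    Since the dataset centers on paraphrase clusters annotated with answers, a hedged sketch of how one such record might be represented follows; the field names are assumptions for illustration, not the dataset's actual schema.

```python
# Hypothetical ComQA-style record: a paraphrase cluster with shared answers.
# Field names ("cluster_id", "questions", "answers") are assumed, not official.
import json

cluster = {
    "cluster_id": "c-0001",
    "questions": [
        "who was the first person to walk on the moon?",
        "name the first man on the moon.",
    ],
    "answers": ["Neil Armstrong"],
}

def avg_paraphrases(clusters):
    """Mean questions per cluster; dataset-wide, 11214 / 4834 is about 2.3."""
    return sum(len(c["questions"]) for c in clusters) / len(clusters)

print(json.dumps(cluster, indent=2))
print(avg_paraphrases([cluster]))
```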

    Query-Driven On-The-Fly Knowledge Base Construction

    Get PDF
    Today's openly available knowledge bases, such as DBpedia, Yago, Wikidata or Freebase, capture billions of facts about the world's entities. However, even the largest among these (i) are still limited in up-to-date coverage of what happens in the real world, and (ii) miss out on many relevant predicates that precisely capture the wide variety of relationships among entities. To overcome both of these limitations, we propose a novel approach to build on-the-fly knowledge bases in a query-driven manner. Our system, called QKBfly, supports analysts and journalists as well as question answering on emerging topics, by dynamically acquiring relevant facts as timely and comprehensively as possible. QKBfly is based on a semantic-graph representation of sentences, by which we perform three key IE tasks, namely named-entity disambiguation, co-reference resolution and relation extraction, in a lightweight and integrated manner. In contrast to Open IE, our output is canonicalized. In contrast to traditional IE, we capture more predicates, including ternary and higher-arity ones. Our experiments demonstrate that QKBfly can build high-quality, on-the-fly knowledge bases that can readily be deployed, e.g., for the task of ad-hoc question answering.
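    Because the abstract stresses canonicalized output and predicates beyond binary relations, here is a minimal sketch of how such higher-arity facts could be modeled; the dataclass and the example fact are illustrative assumptions, not QKBfly's actual data model.

```python
# Hypothetical representation of a canonicalized, higher-arity fact.
# Unlike Open IE, predicate and arguments are canonical identifiers.
from dataclasses import dataclass

@dataclass
class Fact:
    predicate: str        # canonicalized predicate name
    arguments: list[str]  # disambiguated entities; arity may exceed two
    source: str = ""      # sentence the fact was extracted from

# A ternary fact that a binary-relation schema cannot capture in one triple:
fact = Fact(
    predicate="playsRoleIn",
    arguments=["Anthony Hopkins", "Hannibal Lecter",
               "The Silence of the Lambs"],
    source="Hopkins played Lecter in The Silence of the Lambs.",
)
print(fact.predicate, "/".join(fact.arguments))
```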

    Question Answering over Curated and Open Web Sources

    No full text
    The last few years have seen an explosion of research on the topic of automated question answering (QA), spanning the communities of information retrieval, natural language processing, and artificial intelligence. This tutorial covers the highlights of this very active period of growth for QA, giving the audience a grasp of the families of algorithms currently in use. We partition research contributions by the underlying source from which answers are retrieved: curated knowledge graphs, unstructured text, or hybrid corpora. We choose this dimension of partitioning as it is the most discriminative when it comes to algorithm design. Other key dimensions are covered within each sub-topic, such as the complexity of questions addressed, and the degrees of explainability and interactivity introduced in the systems. We conclude the tutorial with the most promising emerging trends in QA, which should help new entrants into this field make the best decisions to take the community forward. Much has changed in the community since the last tutorial on QA at SIGIR 2016, and we believe that this timely overview will benefit a large number of conference participants.
    Comment: SIGIR 2020 Tutorial